Abstract: The aim of authorship attribution is to identify the author of an anonymous document. Earlier research has mostly treated authorship attribution as a multi-class, single-label text classification problem. In many applications, however, such labeled data is difficult or impossible to obtain. Unsupervised attribution models are therefore needed that can estimate similarities and differences in the personal writing styles of authors. This paper experiments with authorship clustering using morpheme-based N-gram features and three unsupervised clustering algorithms: K-means, Mini-Batch K-means, and Ward hierarchical clustering. Clustering performance is evaluated with the silhouette coefficient and the BCubed F-score. On the C50 news data set, K-means achieves the best clustering performance of the three algorithms.
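The BCubed F-score mentioned above compares a predicted clustering against the gold authors one document at a time. The snippet below is a minimal sketch of that computation. It assumes the single-label case, where each document has exactly one true author and is placed in exactly one predicted cluster. The function name `bcubed_f_score` is illustrative and not taken from the paper. For the silhouette coefficient, scikit-learn's `sklearn.metrics.silhouette_score` is a standard off-the-shelf option.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def bcubed_f_score(labels_true: Sequence[Hashable],
                   labels_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (precision, recall, F) of BCubed for single-label clusterings.

    Hypothetical helper (not from the paper): labels_true holds the gold
    author of each document, labels_pred the cluster it was assigned to.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("need at least one document")

    # Sizes of predicted clusters, gold author classes, and their overlaps.
    cluster_size = Counter(labels_pred)
    class_size = Counter(labels_true)
    overlap_size = Counter(zip(labels_true, labels_pred))

    precision = recall = 0.0
    for author, cluster in zip(labels_true, labels_pred):
        # Documents sharing both this document's author and its cluster
        # (the document itself included, so this is always >= 1).
        overlap = overlap_size[(author, cluster)]
        precision += overlap / cluster_size[cluster]  # per-document precision
        recall += overlap / class_size[author]        # per-document recall

    # BCubed averages the per-document scores over all documents.
    precision /= n
    recall /= n
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    authors = ["a", "a", "a", "b", "b"]
    clusters = [0, 0, 1, 1, 1]
    print(bcubed_f_score(authors, clusters))
```

In the usage example, three of author `a`'s documents are split across two clusters, and one of them shares a cluster with author `b`. Precision and recall each come out to 11/15 (about 0.733). Cluster IDs are arbitrary, so a clustering that matches the authors exactly under any relabeling scores 1.0.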
Keywords: morphemes; authorship clustering; silhouette coefficient; BCubed F-score.